Speech/Written
Corpus,
Language Type:
Monolingual
Languages:
English
Availability:
From Data Center(s)
License:
sui generis, free for non-commercial use
Size:
5000 hours
Production Status:
Newly created-finished
Use:
Corpus Creation/Annotation
- Paper title: SPGISpeech: 5,000 hours of transcribed financial audio for fully formatted end-to-end speech recognition
- Paper track: 8.13 Other topics in Speech Recognition: Signal Pr/Oral Presentation
- Paper status: Accept
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Patrick O'Neill | SPGISpeech | /N |
Documentation:
English documentation in progress, to be made available at https://datasets.kensho.com/datasets/scribe
Corpus,
Language Type:
Monolingual
Languages:
English
Availability:
License:
Size:
None GByte
Production Status:
Use:
- Paper title: Speaker-conversation factorial designs for diarization error analysis
- Paper track: 12.7 Evaluation of speech technology systems/Oral Presentation
- Paper status: Accept
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Scott Seyfarth | Fisher English Training Speech Part 1 Transcripts | /N |
Documentation:
None
Speech/Written
Corpus,
Language Type:
Monolingual
Languages:
English
Availability:
Freely Available
License:
Size:
2.6 GByte
Production Status:
Existing-used
Use:
Speech Synthesis
- Paper title: LiteTTS: A Lightweight Mel-spectrogram-free Text-to-wave Synthesizer Based on Generative Adversarial Networks
- Paper track: 7.5 Towards end-to-end speech synthesis/Oral Presentation
- Paper status: Accept
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Kim Nguyen | LJSpeech corpus | /N |
Documentation:
Documentation is on the web page
Speech/Written
Corpus,
Language Type:
Monolingual
Languages:
English
Availability:
Freely Available
License:
CC BY 4.0
Size:
1300 hours
Production Status:
Newly created-finished
Use:
Speech Recognition/Understanding
- Paper title: Europarl-ASR: A Large Corpus of Parliamentary Debates for Streaming ASR Benchmarking and Speech Data Filtering/Verbatimization
- Paper track: 12.6 Speech and multimodal resources/Oral Presentation
- Paper status: Accept
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Gonçal Garcés Díaz-Munío | Europarl-ASR | /N |
Documentation:
Publicly available documentation in English available at https://www.mllp.upv.es/europarl-asr/
Written
Corpus,
Language Type:
Monolingual
Languages:
English
Availability:
Freely Available
License:
European Parliament
Size:
600000000 words
Production Status:
Existing-used
Use:
Corpus Creation/Annotation
- Paper title: Europarl-ASR: A Large Corpus of Parliamentary Debates for Streaming ASR Benchmarking and Speech Data Filtering/Verbatimization
- Paper track: 12.6 Speech and multimodal resources/Oral Presentation
- Paper status: Accept
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Gonçal Garcés Díaz-Munío | Europarl | /N |
Documentation:
https://www.statmt.org/europarl/
Written
Corpus,
Language Type:
Monolingual
Languages:
English
Availability:
From Owner
License:
DCEP Usage Conditions
Size:
7 GByte
Production Status:
Existing-used
Use:
Corpus Creation/Annotation
- Paper title: Europarl-ASR: A Large Corpus of Parliamentary Debates for Streaming ASR Benchmarking and Speech Data Filtering/Verbatimization
- Paper track: 12.6 Speech and multimodal resources/Oral Presentation
- Paper status: Accept
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Gonçal Garcés Díaz-Munío | Digital Corpus of the European Parliament | /N |
Documentation:
https://ec.europa.eu/jrc/en/language-technologies/dcep
Speech/Written
Corpus,
Language Type:
Monolingual
Languages:
English
Availability:
From Data Center(s)
License:
LDC
Size:
None MByte
Production Status:
Existing-used
Use:
Language Modelling
- Paper title: Revisiting Parity of Human vs. Machine Conversational Speech Transcription
- Paper track: 9.9 Other topics in Speech Recognition - Architect/Oral Presentation
- Paper status: Accept
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Courtney Mansfield | Fisher English Training Speech | /N |
Documentation:
Yes; English; yes
Written
Corpus,
Language Type:
Monolingual
Languages:
English
Availability:
From Data Center(s)
License:
LDC
Size:
None MByte
Production Status:
Existing-used
Use:
Parsing and Tagging
- Paper title: Revisiting Parity of Human vs. Machine Conversational Speech Transcription
- Paper track: 9.9 Other topics in Speech Recognition - Architect/Oral Presentation
- Paper status: Accept
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Courtney Mansfield | Penn TreeBank 3, Switchboard corpus part | /N |
Documentation:
Yes, English, Yes
Written
Evaluation Data,
Language Type:
Monolingual
Languages:
English
Availability:
From Data Center(s)
License:
LDC
Size:
None
Production Status:
Existing-used
Use:
Speech Recognition/Understanding
- Paper title: Revisiting Parity of Human vs. Machine Conversational Speech Transcription
- Paper track: 9.9 Other topics in Speech Recognition - Architect/Oral Presentation
- Paper status: Accept
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Courtney Mansfield | 2000 HUB5 English Evaluation Speech | /N |
Documentation:
Yes, English, yes
Speech/Written
Corpus,
Language Type:
Monolingual
Languages:
English
Availability:
Freely Available
License:
CreativeCommons
Size:
39 hours
Production Status:
Newly created-in progress
Use:
Speech Recognition/Understanding
- Paper title: Earnings-21: A Practical Benchmark for ASR in the Wild
- Paper track: 8.13 Other topics in Speech Recognition: Signal Pr/Oral Presentation
- Paper status: Accept
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Miguel Del Rio | Earnings-21 | /N |
Documentation:
None




